Dataset statistics
| Statistic | Value |
|---|---|
| Number of variables | 9 |
| Number of observations | 710 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 50.0 KiB |
| Average record size in memory | 72.2 B |
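The missing-cell and duplicate-row counts above follow simple definitions. The sketch below uses only the standard library and makes two assumptions: rows are held as a list of tuples, and `None` marks a missing cell. pandas-profiling itself works on a pandas DataFrame, and `overview` is a hypothetical helper, not part of its API. The toy rows are illustrative only.

```python
from typing import Any, Sequence


def overview(rows: Sequence[tuple[Any, ...]]) -> dict[str, float]:
    """Missing-cell and duplicate-row statistics (hypothetical helper)."""
    n_rows = len(rows)
    n_cells = sum(len(r) for r in rows)
    # Assumption for this sketch: a cell is missing when it is None.
    missing = sum(1 for r in rows for v in r if v is None)
    # A row counts as a duplicate if an identical row appeared earlier.
    seen: set[tuple[Any, ...]] = set()
    duplicates = 0
    for r in rows:
        if r in seen:
            duplicates += 1
        else:
            seen.add(r)
    return {
        "observations": n_rows,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_rows if n_rows else 0.0,
    }


# Toy rows (illustrative, not taken from the dataset).
rows = [
    ("1963-07-01", -0.003908, "RMW"),
    ("1963-08-01", 0.049457, "Mkt_RF"),
    ("1963-08-01", 0.049457, "Mkt_RF"),  # exact duplicate of the previous row
    ("1963-09-01", None, "CMA"),         # one missing cell
]
print(overview(rows))
```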
Variable types
| Type | Count |
|---|---|
| DateTime | 1 |
| Numeric | 6 |
| Categorical | 2 |
Alerts
| Alert | Type |
|---|---|
| HML is highly correlated with Mkt_RF and 5 other fields | High correlation |
| CMA is highly correlated with Mkt_RF and 1 other field | High correlation |
| Mkt_RF is highly correlated with SMB and 4 other fields | High correlation |
| SMB is highly correlated with Mkt_RF and 2 other fields | High correlation |
| RMW is highly correlated with SMB and 1 other field | High correlation |
| Best is highly correlated with Mkt_RF and 2 other fields | High correlation |
| Worst is highly correlated with Mkt_RF and 2 other fields | High correlation |
| Date has unique values | Unique |
| RF has 69 (9.7%) zeros | Zeros |
Reproduction
| Field | Value |
|---|---|
| Analysis started | 2022-10-11 04:47:49.656335 |
| Analysis finished | 2022-10-11 04:47:51.701924 |
| Duration | 2.05 seconds |
| Software version | pandas-profiling v3.3.0 |
Date
| Statistic | Value |
|---|---|
| Distinct | 710 |
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.7 KiB |
| Minimum | 1963-07-01 00:00:00 |
| Maximum | 2022-08-01 00:00:00 |
Histogram with fixed size bins (bins=50)
Mkt_RF
| Statistic | Value |
|---|---|
| Distinct | 566 |
| Distinct (%) | 79.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.004547425339 |
| Minimum | -0.2644865148 |
| Maximum | 0.1492817027 |
| Zeros | 1 |
| Zeros (%) | 0.1% |
| Negative | 285 |
| Negative (%) | 40.1% |
| Memory size | 5.7 KiB |
Quantile statistics
| Statistic | Value |
|---|---|
| Minimum | -0.2644865148 |
| 5th percentile | -0.07512775864 |
| Q1 | -0.01987113062 |
| Median | 0.009108391137 |
| Q3 | 0.03343477609 |
| 95th percentile | 0.06849466003 |
| Maximum | 0.1492817027 |
| Range | 0.4137682176 |
| Interquartile range (IQR) | 0.0533059067 |
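The quantile table uses standard definitions: Range is the maximum minus the minimum, and the IQR is Q3 minus Q1. A minimal sketch with Python's `statistics.quantiles`, assuming linear interpolation between order statistics (`method="inclusive"`, the same rule as pandas' default `Series.quantile`). `quantile_summary` is a hypothetical helper. The two `print` lines check the Range and IQR arithmetic against the Mkt_RF figures above.

```python
import statistics


def quantile_summary(values: list[float]) -> dict[str, float]:
    """Quantile statistics in the style of the report (hypothetical helper)."""
    # n=100 yields the 1st..99th percentile cut points; "inclusive" interpolates
    # linearly between order statistics, like pandas' default quantile rule.
    pct = statistics.quantiles(values, n=100, method="inclusive")
    lo, hi = min(values), max(values)
    q1, median, q3 = pct[24], pct[49], pct[74]
    return {
        "minimum": lo,
        "5th percentile": pct[4],
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "95th percentile": pct[94],
        "maximum": hi,
        "range": hi - lo,
        "IQR": q3 - q1,
    }


# Range and IQR for Mkt_RF, recomputed from the displayed (rounded) values.
print(0.1492817027 - (-0.2644865148))
print(0.03343477609 - (-0.01987113062))
print(quantile_summary([1.0, 2.0, 3.0, 4.0, 5.0]))
```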
Descriptive statistics
| Statistic | Value |
|---|---|
| Standard deviation | 0.04521785727 |
| Coefficient of variation (CV) | 9.943617301 |
| Kurtosis | 2.621816834 |
| Mean | 0.004547425339 |
| Median Absolute Deviation (MAD) | 0.02676332638 |
| Skewness | -0.7615005447 |
| Sum | 3.228671991 |
| Variance | 0.002044654616 |
| Monotonicity | Not monotonic |
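These statistics are related in simple ways. Variance is the squared standard deviation, CV is the standard deviation divided by the mean, and MAD is the median of absolute deviations from the median (unscaled). The sketch below computes them with the standard library. It assumes the sample (n - 1) standard deviation and the bias-adjusted skewness and excess-kurtosis estimators that pandas' `Series.skew` and `Series.kurt` use. `describe` is a hypothetical helper, not a pandas-profiling function.

```python
import math
import statistics


def describe(values: list[float]) -> dict[str, float]:
    """Descriptive statistics as labelled in the report (hypothetical helper).

    Needs at least 4 values because of the kurtosis adjustment.
    """
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    med = statistics.median(values)
    # Population central moments feed the adjusted Fisher-Pearson estimators.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    return {
        "std": std,
        "variance": std ** 2,
        "cv": std / mean,
        "mad": statistics.median(abs(x - med) for x in values),
        "skewness": math.sqrt(n * (n - 1)) / (n - 2) * g1,
        # Excess kurtosis: 0 for a normal distribution.
        "kurtosis": (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0),
        "sum": math.fsum(values),
    }


print(describe([1.0, 2.0, 3.0, 4.0, 5.0]))
# CV check for Mkt_RF: standard deviation / mean from the table above.
print(0.04521785727 / 0.004547425339)
```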
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.01389610519 | 3 | 0.4% |
| -0.0145046862 | 3 | 0.4% |
| 0.01024731645 | 3 | 0.4% |
| 0.01390290517 | 3 | 0.4% |
| 0.007769737264 | 3 | 0.4% |
| -0.02316627803 | 3 | 0.4% |
| 0.06700422878 | 3 | 0.4% |
| 0.0141987194 | 3 | 0.4% |
| 0.03062619354 | 3 | 0.4% |
| 0.02039068965 | 3 | 0.4% |
| Other values (556) | 680 | 95.8% |
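The common-values table lists the ten most frequent values with their share of all 710 observations, and folds everything else into an "Other values (k)" row. Below is a sketch with `collections.Counter`. It assumes that ties are broken by first occurrence (`Counter.most_common` orders equal counts that way) and that percentages are rounded to one decimal place. pandas-profiling's own tie-breaking may differ, and `common_values` is a hypothetical helper.

```python
from collections import Counter
from typing import Hashable, Iterable


def common_values(values: Iterable[Hashable], top: int = 10) -> list[tuple[str, int, str]]:
    """Top-`top` value counts plus an 'Other values (k)' row (hypothetical helper)."""
    counts = Counter(values)
    n = sum(counts.values())
    rows = [(str(v), c, f"{100 * c / n:.1f}%") for v, c in counts.most_common(top)]
    rest = counts.most_common()[top:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, f"{100 * other / n:.1f}%"))
    return rows


for row in common_values(["Mkt_RF", "SMB", "Mkt_RF", "RF", "HML", "Mkt_RF"], top=2):
    print(row)
```

The same percentage rule applied to the Mkt_RF row above gives 680 / 710, about 95.8%.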
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.2644865148 | 1 | 0.1% |
| -0.1891045091 | 1 | 0.1% |
| -0.1753062219 | 1 | 0.1% |
| -0.1437549036 | 1 | 0.1% |
| -0.1381133021 | 1 | 0.1% |
| -0.1363926249 | 1 | 0.1% |
| -0.1268111669 | 1 | 0.1% |
| -0.1252231448 | 1 | 0.1% |
| -0.1165338163 | 1 | 0.1% |
| -0.1133926874 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.1492817027 | 1 | 0.1% |
| 0.1280413499 | 1 | 0.1% |
| 0.1279533643 | 1 | 0.1% |
| 0.1175163334 | 2 | 0.3% |
| 0.1147562373 | 1 | 0.1% |
| 0.1075082077 | 1 | 0.1% |
| 0.1070590723 | 1 | 0.1% |
| 0.1056204819 | 1 | 0.1% |
| 0.102917534 | 1 | 0.1% |
| 0.09785240017 | 1 | 0.1% |
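The minimum and maximum tables show the ten smallest and ten largest distinct values, each with its count. For example, 0.1175163334 occurs twice. A minimal sketch using `heapq` over value counts; `extreme_values` is a hypothetical helper.

```python
import heapq
from collections import Counter


def extreme_values(
    values: list[float], k: int = 10
) -> tuple[list[tuple[float, int]], list[tuple[float, int]]]:
    """Return the k smallest and k largest distinct values with their counts."""
    counts = Counter(values)
    # (value, count) tuples compare by value first; values are distinct keys.
    smallest = heapq.nsmallest(k, counts.items())  # ascending by value
    largest = heapq.nlargest(k, counts.items())    # descending by value
    return smallest, largest


vals = [0.1492817027, 0.1175163334, 0.1175163334, -0.2644865148, -0.1891045091, 0.0]
low, high = extreme_values(vals, k=2)
print(low)
print(high)
```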
SMB
| Statistic | Value |
|---|---|
| Distinct | 510 |
| Distinct (%) | 71.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.001815165953 |
| Minimum | -0.1666450774 |
| Maximum | 0.1683916513 |
| Zeros | 2 |
| Zeros (%) | 0.3% |
| Negative | 340 |
| Negative (%) | 47.9% |
| Memory size | 5.7 KiB |
Quantile statistics
| Statistic | Value |
|---|---|
| Minimum | -0.1666450774 |
| 5th percentile | -0.04390486804 |
| Q1 | -0.01529131954 |
| Median | 0.0009995003331 |
| Q3 | 0.02014570209 |
| 95th percentile | 0.04797070899 |
| Maximum | 0.1683916513 |
| Range | 0.3350367287 |
| Interquartile range (IQR) | 0.03543702163 |
Descriptive statistics
| Statistic | Value |
|---|---|
| Standard deviation | 0.03009384567 |
| Coefficient of variation (CV) | 16.57911533 |
| Kurtosis | 2.955098793 |
| Mean | 0.001815165953 |
| Median Absolute Deviation (MAD) | 0.01787131921 |
| Skewness | 0.1197920192 |
| Sum | 1.288767826 |
| Variance | 0.0009056395471 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.001299155732 | 5 | 0.7% |
| 0.02673929719 | 4 | 0.6% |
| -0.01399750964 | 4 | 0.6% |
| -0.01146547811 | 4 | 0.6% |
| 0.003095204907 | 4 | 0.6% |
| -0.006219299814 | 4 | 0.6% |
| -0.004108428045 | 3 | 0.4% |
| 0.01901800584 | 3 | 0.4% |
| -0.01075765665 | 3 | 0.4% |
| -0.0005001250417 | 3 | 0.4% |
| Other values (500) | 673 | 94.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.1666450774 | 1 | 0.1% |
| -0.1055827626 | 1 | 0.1% |
| -0.08675686393 | 1 | 0.1% |
| -0.08414276811 | 1 | 0.1% |
| -0.07558598696 | 1 | 0.1% |
| -0.07181828779 | 1 | 0.1% |
| -0.07160341886 | 1 | 0.1% |
| -0.0706370796 | 1 | 0.1% |
| -0.06667413327 | 1 | 0.1% |
| -0.0664603667 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.1683916513 | 1 | 0.1% |
| 0.1214208552 | 1 | 0.1% |
| 0.09903052346 | 1 | 0.1% |
| 0.0946736136 | 1 | 0.1% |
| 0.08782771037 | 1 | 0.1% |
| 0.08709470685 | 1 | 0.1% |
| 0.08167214864 | 1 | 0.1% |
| 0.07686844426 | 1 | 0.1% |
| 0.07334339422 | 1 | 0.1% |
| 0.07269268539 | 1 | 0.1% |
HML
| Statistic | Value |
|---|---|
| Distinct | 498 |
| Distinct (%) | 70.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.002539667261 |
| Minimum | -0.1504741134 |
| Maximum | 0.1200027924 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 326 |
| Negative (%) | 45.9% |
| Memory size | 5.7 KiB |
Quantile statistics
| Statistic | Value |
|---|---|
| Minimum | -0.1504741134 |
| 5th percentile | -0.0418642041 |
| Q1 | -0.01397215853 |
| Median | 0.002446992448 |
| Q3 | 0.01734863833 |
| 95th percentile | 0.05260193328 |
| Maximum | 0.1200027924 |
| Range | 0.2704769057 |
| Interquartile range (IQR) | 0.03132079687 |
Descriptive statistics
| Statistic | Value |
|---|---|
| Standard deviation | 0.02958606156 |
| Coefficient of variation (CV) | 11.64958182 |
| Kurtosis | 2.541712089 |
| Mean | 0.002539667261 |
| Median Absolute Deviation (MAD) | 0.01578555817 |
| Skewness | -0.08993790357 |
| Sum | 1.803163755 |
| Variance | 0.0008753350384 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.008464078412 | 7 | 1.0% |
| 0.01163208423 | 4 | 0.6% |
| 0.001498876124 | 4 | 0.6% |
| 0.01734863833 | 4 | 0.6% |
| -0.0002000200027 | 4 | 0.6% |
| -0.001300845733 | 4 | 0.6% |
| 0.004290781417 | 4 | 0.6% |
| 0.01182975175 | 4 | 0.6% |
| 0.02244618883 | 4 | 0.6% |
| -0.02798803654 | 3 | 0.4% |
| Other values (488) | 668 | 94.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.1504741134 | 1 | 0.1% |
| -0.1197975635 | 1 | 0.1% |
| -0.1039171134 | 1 | 0.1% |
| -0.1020327256 | 1 | 0.1% |
| -0.08806647887 | 1 | 0.1% |
| -0.08697501401 | 1 | 0.1% |
| -0.08686593302 | 1 | 0.1% |
| -0.0814269987 | 1 | 0.1% |
| -0.07969276891 | 1 | 0.1% |
| -0.0720332029 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.1200027924 | 1 | 0.1% |
| 0.1176052421 | 1 | 0.1% |
| 0.1161817543 | 1 | 0.1% |
| 0.08277742646 | 1 | 0.1% |
| 0.08075014969 | 1 | 0.1% |
| 0.07973496802 | 1 | 0.1% |
| 0.07955027876 | 1 | 0.1% |
| 0.07871875471 | 1 | 0.1% |
| 0.07853387765 | 1 | 0.1% |
| 0.07352923329 | 1 | 0.1% |
RMW
| Statistic | Value |
|---|---|
| Distinct | 446 |
| Distinct (%) | 62.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.002474803022 |
| Minimum | -0.2073932412 |
| Maximum | 0.1230137759 |
| Zeros | 1 |
| Zeros (%) | 0.1% |
| Negative | 309 |
| Negative (%) | 43.5% |
| Memory size | 5.7 KiB |
Quantile statistics
| Statistic | Value |
|---|---|
| Minimum | -0.2073932412 |
| 5th percentile | -0.02786984355 |
| Q1 | -0.007906172524 |
| Median | 0.0023971246 |
| Q3 | 0.01299025912 |
| 95th percentile | 0.0341211896 |
| Maximum | 0.1230137759 |
| Range | 0.3304070171 |
| Interquartile range (IQR) | 0.02089643165 |
Descriptive statistics
| Statistic | Value |
|---|---|
| Standard deviation | 0.02224253239 |
| Coefficient of variation (CV) | 8.98759707 |
| Kurtosis | 14.42674042 |
| Mean | 0.002474803022 |
| Median Absolute Deviation (MAD) | 0.01052460425 |
| Skewness | -0.7885566537 |
| Sum | 1.757110146 |
| Variance | 0.0004947302471 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.00299550898 | 7 | 1.0% |
| 0.01291622527 | 5 | 0.7% |
| 0.002696361548 | 5 | 0.7% |
| 0.01301493708 | 5 | 0.7% |
| -0.006823225348 | 5 | 0.7% |
| 0.02019470729 | 4 | 0.6% |
| 0.0007996801706 | 4 | 0.6% |
| -0.01379471102 | 4 | 0.6% |
| 0.009257021263 | 4 | 0.6% |
| -0.004208844774 | 4 | 0.6% |
| Other values (436) | 663 | 93.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.2073932412 | 1 | 0.1% |
| -0.0966210386 | 1 | 0.1% |
| -0.08686593302 | 1 | 0.1% |
| -0.07904320734 | 1 | 0.1% |
| -0.07321606233 | 1 | 0.1% |
| -0.06517872602 | 1 | 0.1% |
| -0.04919024419 | 1 | 0.1% |
| -0.04814037533 | 2 | 0.3% |
| -0.04730127312 | 1 | 0.1% |
| -0.04541586353 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.1230137759 | 1 | 0.1% |
| 0.1117202496 | 1 | 0.1% |
| 0.09166718853 | 1 | 0.1% |
| 0.08718636168 | 1 | 0.1% |
| 0.07751644243 | 1 | 0.1% |
| 0.07380792714 | 1 | 0.1% |
| 0.07157619849 | 1 | 0.1% |
| 0.06971261241 | 1 | 0.1% |
| 0.06259914176 | 1 | 0.1% |
| 0.06100102156 | 1 | 0.1% |
CMA
| Statistic | Value |
|---|---|
| Distinct | 443 |
| Distinct (%) | 62.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.002633402886 |
| Minimum | -0.07192573957 |
| Maximum | 0.08663630666 |
| Zeros | 1 |
| Zeros (%) | 0.1% |
| Negative | 329 |
| Negative (%) | 46.3% |
| Memory size | 5.7 KiB |
Quantile statistics
| Statistic | Value |
|---|---|
| Minimum | -0.07192573957 |
| 5th percentile | -0.02691395425 |
| Q1 | -0.01005033585 |
| Median | 0.000949547788 |
| Q3 | 0.01479008547 |
| 95th percentile | 0.03614868703 |
| Maximum | 0.08663630666 |
| Range | 0.1585620462 |
| Interquartile range (IQR) | 0.02484042133 |
Descriptive statistics
| Statistic | Value |
|---|---|
| Standard deviation | 0.02029502409 |
| Coefficient of variation (CV) | 7.706767619 |
| Kurtosis | 1.369297904 |
| Mean | 0.002633402886 |
| Median Absolute Deviation (MAD) | 0.01256676841 |
| Skewness | 0.2012300489 |
| Sum | 1.869716049 |
| Variance | 0.0004118880029 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.003405793135 | 6 | 0.8% |
| 0.0008995952428 | 5 | 0.7% |
| 0.008959741371 | 5 | 0.7% |
| -0.0004000800213 | 5 | 0.7% |
| 0.008364916332 | 5 | 0.7% |
| -0.01207258123 | 4 | 0.6% |
| 0.004589452334 | 4 | 0.6% |
| -0.01612938193 | 4 | 0.6% |
| -0.003305457009 | 4 | 0.6% |
| -0.009545412844 | 4 | 0.6% |
| Other values (433) | 664 | 93.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.07192573957 | 1 | 0.1% |
| -0.07010062768 | 1 | 0.1% |
| -0.06849299645 | 1 | 0.1% |
| -0.06006852647 | 1 | 0.1% |
| -0.05826490813 | 1 | 0.1% |
| -0.05794695995 | 1 | 0.1% |
| -0.05129329439 | 1 | 0.1% |
| -0.04856019062 | 1 | 0.1% |
| -0.04814037533 | 1 | 0.1% |
| -0.04646287441 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.08663630666 | 1 | 0.1% |
| 0.08056564784 | 1 | 0.1% |
| 0.07427224437 | 1 | 0.1% |
| 0.0635380208 | 1 | 0.1% |
| 0.06259914176 | 1 | 0.1% |
| 0.06024808035 | 1 | 0.1% |
| 0.0575139062 | 1 | 0.1% |
| 0.05741949087 | 1 | 0.1% |
| 0.05723063345 | 1 | 0.1% |
| 0.05496155807 | 1 | 0.1% |
RF
| Statistic | Value |
|---|---|
| Distinct | 106 |
| Distinct (%) | 14.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.003616216582 |
| Minimum | 0 |
| Maximum | 0.01340968691 |
| Zeros | 69 |
| Zeros (%) | 9.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 5.7 KiB |
Quantile statistics
| Statistic | Value |
|---|---|
| Minimum | 0 |
| 5th percentile | 0 |
| Q1 | 0.001399020914 |
| Median | 0.003792798239 |
| Q3 | 0.005087039049 |
| 95th percentile | 0.008067371078 |
| Maximum | 0.01340968691 |
| Range | 0.01340968691 |
| Interquartile range (IQR) | 0.003688018135 |
Descriptive statistics
| Statistic | Value |
|---|---|
| Standard deviation | 0.002670242841 |
| Coefficient of variation (CV) | 0.7384078858 |
| Kurtosis | 0.6088636049 |
| Mean | 0.003616216582 |
| Median Absolute Deviation (MAD) | 0.001743290106 |
| Skewness | 0.6509558854 |
| Sum | 2.567513773 |
| Variance | 7.130196828 × 10⁻⁶ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 69 | 9.7% |
| 9.999500033 × 10⁻⁵ | 44 | 6.2% |
| 0.004290781417 | 21 | 3.0% |
| 0.00399202127 | 21 | 3.0% |
| 0.004191204618 | 18 | 2.5% |
| 0.004589452334 | 18 | 2.5% |
| 0.003892414715 | 16 | 2.3% |
| 0.003095204907 | 16 | 2.3% |
| 0.004390348301 | 16 | 2.3% |
| 0.003693171838 | 15 | 2.1% |
| Other values (96) | 456 | 64.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 69 | 9.7% |
| 9.999500033 × 10⁻⁵ | 44 | 6.2% |
| 0.0001999800027 | 8 | 1.1% |
| 0.000299955009 | 4 | 0.6% |
| 0.0003999200213 | 2 | 0.3% |
| 0.0004998750417 | 1 | 0.1% |
| 0.000599820072 | 5 | 0.7% |
| 0.0006997551143 | 6 | 0.8% |
| 0.0007996801706 | 7 | 1.0% |
| 0.0008995952428 | 7 | 1.0% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.01340968691 | 1 | 0.1% |
| 0.01301493708 | 1 | 0.1% |
| 0.01271877241 | 1 | 0.1% |
| 0.01252128055 | 1 | 0.1% |
| 0.01232374969 | 2 | 0.3% |
| 0.01202738021 | 3 | 0.4% |
| 0.01143437763 | 1 | 0.1% |
| 0.01123663193 | 1 | 0.1% |
| 0.01074209653 | 1 | 0.1% |
| 0.0106431601 | 2 | 0.3% |
Best
| Statistic | Value |
|---|---|
| Distinct | 6 |
| Distinct (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.7 KiB |
Length
| Statistic | Value |
|---|---|
| Max length | 6 |
| Median length | 3 |
| Mean length | 4.047887324 |
| Min length | 2 |
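The length statistics are taken over the string in every row, not over the six distinct categories. The sketch below expands the Best counts from the Common Values table of this variable (251 × Mkt_RF, 127 × SMB, 124 × HML, 122 × RMW, 77 × CMA, 9 × RF) into one string per row. It then recomputes the length summary and the total character count reported under Characters and Unicode. The variable names are illustrative.

```python
import statistics

# Value counts for the Best column, taken from its Common Values table.
best_counts = {"Mkt_RF": 251, "SMB": 127, "HML": 124, "RMW": 122, "CMA": 77, "RF": 9}

# One length per row: each category's length repeated by its count.
lengths = [len(v) for v, c in best_counts.items() for _ in range(c)]

print(len(lengths))                # number of rows
print(max(lengths))                # max length
print(statistics.median(lengths))  # median length
print(statistics.mean(lengths))    # mean length
print(min(lengths))                # min length
print(sum(lengths))                # total characters
```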
Characters and Unicode
| Statistic | Value |
|---|---|
| Total characters | 2874 |
| Distinct characters | 13 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
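Character categories come from each code point's Unicode general category. Python's `unicodedata.category` returns the two-letter code (`Lu` for uppercase letter, `Ll` for lowercase letter, `Pc` for connector punctuation). The sketch below maps those codes to the names used in this report and tallies the Best column from its value counts. The standard library's `unicodedata` does not expose scripts or blocks, so this sketch leaves those out. `NAMES` is an illustrative mapping that only covers the codes occurring here.

```python
import unicodedata
from collections import Counter

# Long names for the category codes that occur in this column (illustrative).
NAMES = {"Lu": "Uppercase Letter", "Ll": "Lowercase Letter", "Pc": "Connector Punctuation"}

# Value counts for the Best column, taken from its Common Values table.
best_counts = {"Mkt_RF": 251, "SMB": 127, "HML": 124, "RMW": 122, "CMA": 77, "RF": 9}

chars: Counter = Counter()
categories: Counter = Counter()
for value, count in best_counts.items():
    for ch in value:
        code = unicodedata.category(ch)
        chars[ch] += count
        categories[NAMES.get(code, code)] += count

total = sum(chars.values())
for name, count in categories.most_common():
    print(f"{name}: {count} ({100 * count / total:.1f}%)")
print(chars.most_common(3))
```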
Unique
| Statistic | Value |
|---|---|
| Unique | 0 |
| Unique (%) | 0.0% |
Sample
| Row | Value |
|---|---|
| 1st row | RMW |
| 2nd row | Mkt_RF |
| 3rd row | CMA |
| 4th row | RMW |
| 5th row | CMA |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Mkt_RF | 251 | 35.4% |
| SMB | 127 | 17.9% |
| HML | 124 | 17.5% |
| RMW | 122 | 17.2% |
| CMA | 77 | 10.8% |
| RF | 9 | 1.3% |
Length
Histogram of lengths of the category
Category Frequency Plot
Lowercased values
| Value | Count | Frequency (%) |
|---|---|---|
| mkt_rf | 251 | 35.4% |
| smb | 127 | 17.9% |
| hml | 124 | 17.5% |
| rmw | 122 | 17.2% |
| cma | 77 | 10.8% |
| rf | 9 | 1.3% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| M | 701 | 24.4% |
| R | 382 | 13.3% |
| F | 260 | 9.0% |
| k | 251 | 8.7% |
| t | 251 | 8.7% |
| _ | 251 | 8.7% |
| S | 127 | 4.4% |
| B | 127 | 4.4% |
| H | 124 | 4.3% |
| L | 124 | 4.3% |
| Other values (3) | 276 | 9.6% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Uppercase Letter | 2121 | 73.8% |
| Lowercase Letter | 502 | 17.5% |
| Connector Punctuation | 251 | 8.7% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| M | 701 | 33.1% |
| R | 382 | 18.0% |
| F | 260 | 12.3% |
| S | 127 | 6.0% |
| B | 127 | 6.0% |
| H | 124 | 5.8% |
| L | 124 | 5.8% |
| W | 122 | 5.8% |
| C | 77 | 3.6% |
| A | 77 | 3.6% |
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| k | 251 | 50.0% |
| t | 251 | 50.0% |
Connector Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| _ | 251 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 2623 | 91.3% |
| Common | 251 | 8.7% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| M | 701 | 26.7% |
| R | 382 | 14.6% |
| F | 260 | 9.9% |
| k | 251 | 9.6% |
| t | 251 | 9.6% |
| S | 127 | 4.8% |
| B | 127 | 4.8% |
| H | 124 | 4.7% |
| L | 124 | 4.7% |
| W | 122 | 4.7% |
| Other values (2) | 154 | 5.9% |
Common
| Value | Count | Frequency (%) |
|---|---|---|
| _ | 251 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 2874 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| M | 701 | 24.4% |
| R | 382 | 13.3% |
| F | 260 | 9.0% |
| k | 251 | 8.7% |
| t | 251 | 8.7% |
| _ | 251 | 8.7% |
| S | 127 | 4.4% |
| B | 127 | 4.4% |
| H | 124 | 4.3% |
| L | 124 | 4.3% |
| Other values (3) | 276 | 9.6% |
Worst
| Statistic | Value |
|---|---|
| Distinct | 6 |
| Distinct (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.7 KiB |
Length
| Statistic | Value |
|---|---|
| Max length | 6 |
| Median length | 3 |
| Mean length | 3.804225352 |
| Min length | 2 |
Characters and Unicode
| Statistic | Value |
|---|---|
| Total characters | 2701 |
| Distinct characters | 13 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Statistic | Value |
|---|---|
| Unique | 0 |
| Unique (%) | 0.0% |
Sample
| Row | Value |
|---|---|
| 1st row | CMA |
| 2nd row | SMB |
| 3rd row | Mkt_RF |
| 4th row | CMA |
| 5th row | SMB |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Mkt_RF | 194 | 27.3% |
| SMB | 142 | 20.0% |
| HML | 141 | 19.9% |
| RMW | 131 | 18.5% |
| CMA | 91 | 12.8% |
| RF | 11 | 1.5% |
Length
Histogram of lengths of the category
Category Frequency Plot
Lowercased values
| Value | Count | Frequency (%) |
|---|---|---|
| mkt_rf | 194 | 27.3% |
| smb | 142 | 20.0% |
| hml | 141 | 19.9% |
| rmw | 131 | 18.5% |
| cma | 91 | 12.8% |
| rf | 11 | 1.5% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| M | 699 | 25.9% |
| R | 336 | 12.4% |
| F | 205 | 7.6% |
| k | 194 | 7.2% |
| t | 194 | 7.2% |
| _ | 194 | 7.2% |
| S | 142 | 5.3% |
| B | 142 | 5.3% |
| H | 141 | 5.2% |
| L | 141 | 5.2% |
| Other values (3) | 313 | 11.6% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Uppercase Letter | 2119 | 78.5% |
| Lowercase Letter | 388 | 14.4% |
| Connector Punctuation | 194 | 7.2% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| M | 699 | 33.0% |
| R | 336 | 15.9% |
| F | 205 | 9.7% |
| S | 142 | 6.7% |
| B | 142 | 6.7% |
| H | 141 | 6.7% |
| L | 141 | 6.7% |
| W | 131 | 6.2% |
| C | 91 | 4.3% |
| A | 91 | 4.3% |
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| k | 194 | 50.0% |
| t | 194 | 50.0% |
Connector Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| _ | 194 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 2507 | 92.8% |
| Common | 194 | 7.2% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| M | 699 | 27.9% |
| R | 336 | 13.4% |
| F | 205 | 8.2% |
| k | 194 | 7.7% |
| t | 194 | 7.7% |
| S | 142 | 5.7% |
| B | 142 | 5.7% |
| H | 141 | 5.6% |
| L | 141 | 5.6% |
| W | 131 | 5.2% |
| Other values (2) | 182 | 7.3% |
Common
| Value | Count | Frequency (%) |
|---|---|---|
| _ | 194 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 2701 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| M | 699 | 25.9% |
| R | 336 | 12.4% |
| F | 205 | 7.6% |
| k | 194 | 7.2% |
| t | 194 | 7.2% |
| _ | 194 | 7.2% |
| S | 142 | 5.3% |
| B | 142 | 5.3% |
| H | 141 | 5.2% |
| L | 141 | 5.2% |
| Other values (3) | 313 | 11.6% |
Correlations
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables. It is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
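A minimal sketch of that recipe: rank each variable, giving tied values the average of their positions, then take the Pearson correlation of the two rank lists. The helper names are hypothetical. In pandas, the same quantity is available as `Series.corr(method="spearman")`.

```python
import math


def average_ranks(xs: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Covariance over product of standard deviations (the 1/n factors cancel)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
cubes = [v ** 3 for v in x]  # monotonic but nonlinear in x
print(spearman(x, cubes))    # 1 up to rounding: the ordering is preserved
print(pearson(x, cubes))     # below 1 because the relationship is not linear
```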
Pearson's r
Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. For an exact linear relationship, this means the slope (the angle to the x-axis) does not affect r, as long as it is nonzero. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
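The definition and the invariance property can both be checked numerically. The sketch below divides the sample covariance by the product of the sample standard deviations. It then shows that shifting and positively rescaling either variable leaves r unchanged, and that any exact line with a nonzero slope gives |r| = 1. A negative scale factor would flip the sign of r. `pearson_r` and the sample data are illustrative.

```python
import statistics


def pearson_r(x: list[float], y: list[float]) -> float:
    """Sample covariance of x and y divided by the product of their sample stds."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return cov / (statistics.stdev(x) * statistics.stdev(y))


x = [0.5, 1.5, 2.0, 3.5, 4.0]
y = [1.0, 0.5, 2.5, 3.0, 5.0]
r = pearson_r(x, y)

# Separate positive rescaling and shifting of each variable leaves r unchanged.
r_shifted = pearson_r([10 * a - 3 for a in x], [0.2 * b + 7 for b in y])
print(r, r_shifted)

# An exact line gives |r| = 1 whatever its (nonzero) slope.
print(pearson_r(x, [0.01 * a for a in x]), pearson_r(x, [-50 * a for a in x]))
```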
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2 for n observations.
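A direct O(n²) sketch of that count. This is the plain "tau-a" form described above: tied pairs count as neither concordant nor discordant, but they stay in the denominator. As far as we know, pandas' `Series.corr(method="kendall")` uses SciPy's tie-adjusted tau-b, so results can differ when ties are present. `kendall_tau_a` is a hypothetical helper.

```python
from itertools import combinations


def kendall_tau_a(x: list[float], y: list[float]) -> float:
    """(concordant - discordant) / n(n-1)/2; tied pairs count as neither."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# 5 concordant pairs and 1 discordant pair out of 6, so tau = 4/6.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```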
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures nonlinear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution.
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
First rows
| | Date | Mkt_RF | SMB | HML | RMW | CMA | RF | Best | Worst |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1963-07-01 | -0.003908 | -0.004108 | -0.009747 | 0.006777 | -0.011870 | 0.002696 | RMW | CMA |
| 1 | 1963-08-01 | 0.049457 | -0.008032 | 0.017840 | 0.003594 | -0.003506 | 0.002497 | Mkt_RF | SMB |
| 2 | 1963-09-01 | -0.015825 | -0.005214 | 0.001299 | -0.007125 | 0.002896 | 0.002696 | CMA | Mkt_RF |
| 3 | 1963-10-01 | 0.024985 | -0.013998 | -0.001001 | 0.027615 | -0.020305 | 0.002896 | RMW | CMA |
| 4 | 1963-11-01 | -0.008536 | -0.008839 | 0.017349 | -0.005113 | 0.022153 | 0.002696 | CMA | SMB |
| 5 | 1963-12-01 | 0.018135 | -0.021224 | -0.000200 | 0.000300 | -0.000700 | 0.002896 | Mkt_RF | SMB |
| 6 | 1964-01-01 | 0.022153 | 0.001299 | 0.014692 | 0.001699 | 0.014593 | 0.002996 | Mkt_RF | SMB |
| 7 | 1964-02-01 | 0.015283 | 0.002796 | 0.027712 | -0.000500 | 0.009059 | 0.002597 | HML | RMW |
| 8 | 1964-03-01 | 0.014002 | 0.012225 | 0.033435 | -0.022348 | 0.031692 | 0.003095 | HML | RMW |
| 9 | 1964-04-01 | 0.001000 | -0.015317 | -0.006723 | -0.012781 | -0.010859 | 0.002896 | RF | SMB |
Last rows
| | Date | Mkt_RF | SMB | HML | RMW | CMA | RF | Best | Worst |
|---|---|---|---|---|---|---|---|---|---|
| 700 | 2021-11-01 | -0.015621 | -0.017757 | -0.004410 | 0.069713 | 0.017250 | 0.000000 | RMW | SMB |
| 701 | 2021-12-01 | 0.030529 | -0.007730 | 0.032274 | 0.048028 | 0.043347 | 0.000100 | RMW | SMB |
| 702 | 2022-01-01 | -0.064539 | -0.041343 | 0.120003 | 0.008662 | 0.074272 | 0.000000 | HML | Mkt_RF |
| 703 | 2022-02-01 | -0.023166 | 0.029170 | 0.029947 | -0.021019 | 0.030820 | 0.000000 | CMA | Mkt_RF |
| 704 | 2022-03-01 | 0.030044 | -0.021734 | -0.018164 | -0.015723 | 0.031208 | 0.000100 | CMA | SMB |
| 705 | 2022-04-01 | -0.099378 | -0.004008 | 0.060060 | 0.035657 | 0.057514 | 0.000100 | HML | Mkt_RF |
| 706 | 2022-05-01 | -0.003406 | -0.000600 | 0.080750 | 0.014297 | 0.039028 | 0.000300 | HML | Mkt_RF |
| 707 | 2022-06-01 | -0.088066 | 0.012916 | -0.061556 | 0.018331 | -0.048140 | 0.000600 | RMW | Mkt_RF |
| 708 | 2022-07-01 | 0.091393 | 0.018527 | -0.041864 | 0.006777 | -0.071926 | 0.000800 | Mkt_RF | CMA |
| 709 | 2022-08-01 | -0.038533 | 0.014987 | 0.003095 | -0.049190 | 0.013015 | 0.001898 | SMB | RMW |
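In every row shown in the first- and last-rows tables, Best names the factor column with the largest value and Worst names the one with the smallest, among Mkt_RF, SMB, HML, RMW, CMA and RF. This is an inference from these twenty rows; the report itself does not state how the columns were built. If the rule holds, the columns could be derived as sketched below. `best_and_worst` is a hypothetical helper, and ties would go to the first column listed in `FACTORS`. With pandas, the equivalent would be `df[factors].idxmax(axis=1)` and `df[factors].idxmin(axis=1)`.

```python
FACTORS = ("Mkt_RF", "SMB", "HML", "RMW", "CMA", "RF")


def best_and_worst(row: dict[str, float]) -> tuple[str, str]:
    """Names of the factor columns with the largest and the smallest value."""
    best = max(FACTORS, key=lambda name: row[name])
    worst = min(FACTORS, key=lambda name: row[name])
    return best, worst


# Row 0 (1963-07-01) and row 709 (2022-08-01), copied from the tables above.
row0 = {"Mkt_RF": -0.003908, "SMB": -0.004108, "HML": -0.009747,
        "RMW": 0.006777, "CMA": -0.011870, "RF": 0.002696}
row709 = {"Mkt_RF": -0.038533, "SMB": 0.014987, "HML": 0.003095,
          "RMW": -0.049190, "CMA": 0.013015, "RF": 0.001898}
print(best_and_worst(row0))    # ('RMW', 'CMA'), matching the Best/Worst columns
print(best_and_worst(row709))  # ('SMB', 'RMW'), matching the Best/Worst columns
```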